The techniques described in this chapter for summarizing, graphing, and comparing survival data deal

with the time interval from a defined starting point to the first occurrence of an endpoint event. The

event can be designated as death or a relapse of a particular condition, such as a recurrence of cancer.

Or you could designate the event to be surgical removal (called an explant) of a failed mechanical

component, such as an artificial heart valve. If a patient’s heart valve was implanted on January 10

(beginning of time interval), but their body rejected it and the explant took place on January 30 (time of

event), then the time interval from implant to explant is 30 – 10, or 20 days.
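
The interval arithmetic above can be sketched in Python with the standard datetime module. This is only an illustrative sketch: the function name interval_days is hypothetical, and the year 2024 is an assumption, because the example gives only the month and day.

```python
from datetime import date


def interval_days(start: date, event: date) -> int:
    """Return the number of days from the starting point to the event."""
    if event < start:
        raise ValueError("the event date precedes the starting point")
    return (event - start).days


# Dates from the heart valve example; the year is assumed for illustration.
implant = date(2024, 1, 10)   # beginning of the time interval
explant = date(2024, 1, 30)   # time of the event

days_to_explant = interval_days(implant, explant)
print(days_to_explant)  # 20
```

Subtracting calendar dates rather than day-of-month numbers keeps the calculation correct when the interval crosses a month or year boundary.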

A person can die only once, so survival analysis can obviously be used for one-time events. But other

endpoints can occur multiple times, such as having a stroke or having cancer go into remission. The

techniques we describe in this chapter analyze only the time to the first occurrence of the event.

Handling multiple occurrences of an event requires more advanced survival analysis methods, and

these are beyond the scope of this book.
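
When an endpoint such as a stroke is recorded more than once for a patient, only the earliest occurrence enters the kind of analysis described here. The following is a minimal sketch of that selection step. The function name days_to_first_event and all dates are hypothetical, and returning None when no event is recorded is an assumption made for the example.

```python
from datetime import date
from typing import Optional, Sequence


def days_to_first_event(start: date, event_dates: Sequence[date]) -> Optional[int]:
    """Days from the starting point to the earliest event on or after it.

    Returns None if no event on or after the starting point is recorded.
    """
    eligible = [d for d in event_dates if d >= start]
    if not eligible:
        return None
    return (min(eligible) - start).days


# Hypothetical patient with three recorded strokes, listed out of order.
start = date(2024, 3, 1)
strokes = [date(2024, 9, 15), date(2024, 5, 10), date(2025, 1, 2)]

first_stroke_days = days_to_first_event(start, strokes)
print(first_stroke_days)  # 70 (March 1 to May 10)
```

The later strokes are simply ignored here, which is exactly the limitation the text describes.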

The starting point of the time interval is somewhat arbitrary, so it must be defined explicitly

every time you do a survival analysis. Imagine that you’re studying the progression of chronic

obstructive pulmonary disease (COPD) in a group of patients. If you want to study the natural

history of the disease, the starting point can be the diagnosis date. But if you’re instead interested

in evaluating the efficacy of a treatment, the starting point can be defined as the date the treatment

began.
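
To make the effect of the starting point concrete, the sketch below computes the time to the same event from two different starting points for one hypothetical COPD patient. The record layout, the field names, and all dates are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical patient record; the field names are illustrative only.
patient = {
    "diagnosis": date(2020, 6, 1),
    "treatment_start": date(2021, 6, 1),
    "event": date(2023, 6, 1),
}


def time_to_event(record: dict, start_field: str) -> int:
    """Days from the chosen starting point to the event."""
    return (record["event"] - record[start_field]).days


# Natural history of the disease: start the clock at diagnosis.
natural_history_days = time_to_event(patient, "diagnosis")

# Treatment efficacy: start the clock when treatment began.
treatment_days = time_to_event(patient, "treatment_start")

print(natural_history_days, treatment_days)  # 1095 730
```

The same patient and the same event yield two different survival times, which is why the starting point has to be stated explicitly in every analysis.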

Recognizing that survival times aren’t normally distributed

Even though survival times are numerical quantities, they’re almost never normally

distributed. Because of this, it’s generally not a good idea to use the following:

Means and standard deviations to describe survival times

T tests and ANOVAs to compare survival times between groups

Least-squares regression to investigate how survival time is influenced by other factors

If non-normality were the only problem with survival data, you’d be able to summarize survival times

as medians and centiles instead of means and standard deviations. Also, you could compare survival

between groups with nonparametric Mann-Whitney and Kruskal-Wallis tests instead of t tests and

ANOVAs. But time-to-event data are susceptible to a specific type of missingness called censoring.

Standard parametric and nonparametric tests and regression methods are not equipped to deal with censoring, so

in this chapter we present survival analysis techniques, which are.
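
Setting censoring aside for a moment, the sketch below shows why medians and centiles describe non-normal survival times better than means. The ten survival times are hypothetical and fully observed, and they are assumed to be right-skewed (a few patients survive much longer than the rest), which is a common but not universal pattern.

```python
import statistics

# Hypothetical, fully observed survival times in months (no censoring).
survival_months = [2, 3, 3, 4, 5, 6, 8, 12, 30, 60]

mean_months = statistics.mean(survival_months)
median_months = statistics.median(survival_months)

# Quartiles (25th, 50th, and 75th centiles) using the module's default method.
quartiles = statistics.quantiles(survival_months, n=4)

print(f"mean   = {mean_months:.1f} months")
print(f"median = {median_months:.1f} months")
print(f"quartiles = {quartiles}")
```

In this made-up sample, the two longest survivors pull the mean well above the median, so the mean overstates how long a typical patient survived. This comparison is valid only because every time is fully observed. It does not address censoring.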

Considering censoring

Survival data are defined as the time interval between a selected starting point and an endpoint that

represents an event. But unfortunately, the time the event takes place can be missing in survival data.

This can happen in two general ways: